This document walks through some of our findings as well as the steps I took to get there. I’ve got two main buckets of data for you: nonprofits by geography, and nonprofits by category.
We have some more data elements to add, though I think this is a good place to pause and circle up to make sure I’ve got things right, since there are a lot of ways to slice and dice this data.
This is a more “gritty” walkthrough than I would usually provide, and I created it this way because A) it’s easiest for me to keep the documentation and results in the same place, and B) we’re going to make some decisions together about the data, so I wanted to give you all the info I have.
(If you’re feeling particularly adventurous, you can click on the “code” buttons to the right for a quick tutorial on how to do my job. You, too, can be a data scientist–just gotta copy and paste what’s in those blocks.)
This section runs through the number of nonprofits by geography. I’m going to include financial info as well, but I want to check which nonprofits we want those numbers for. Probably public charities?
We start by figuring out which organizations are in Louisville. We’ll use data from the IRS Business Master File (downloaded 6/9/2022). This document contains a full listing of tax-exempt organizations across the county, as well as some basic financial info (revenue, expenses, and assets).
To subset the data to the Louisville area, we start by filtering the data to organizations with a zip code within the Louisville MSA. Zip codes cross county and MSA boundaries, so some of these organizations lie outside the MSA, and we’ll need to remove them.
About 15% of organizations have a PO box as their primary address. We include them in our list if more than 50% of their zip code is within the Louisville MSA. (We calculate the percent of the zip code in the Louisville MSA as the percent of the businesses in that zip code that are in the Louisville MSA.) For those in the Louisville MSA, we assign them to the county which contains the largest amount of that zip code.
The remaining 85% of businesses have a street address. For these organizations, we find the organization’s longitude and latitude using the Census Bureau’s address geocoding service, and we fill in missing values using the private service Geocodio. We then compare their latitude and longitude to a map of counties in the Louisville are to decide whether they are in the Louisville MSA and to decide which county they are located in.
While we’re working with our map data, we also assign nonprofits to neighborhoods and Metro Council Districts. As mentioned before, zip codes are included in the original dataset.
The map below shows the original list, as defined by zip codes. We remove the organizations represented by the blue dots, keeping only those in grey within the Louisville MSA boundaries. You can mouse over the list to see the organizations’ names.
# Read in IRS Business Master files and combine into one data frame
bmf1 <- read_csv("eo1.csv", col_types = "cccccccncnncnncnnnnnnnnnnncc")
bmf2 <- read_csv("eo2.csv", col_types = "cccccccncnncnncnnnnnnnnnnncc")
bmf3 <- read_csv("eo3.csv", col_types = "cccccccncnncnncnnnnnnnnnnncc")
bmf4 <- read_csv("eo4.csv", col_types = "cccccccncnncnncnnnnnnnnnnncc")
bmf <- bind_rows(bmf1, bmf2, bmf3, bmf4)
rm(bmf1, bmf2, bmf3, bmf4)
ntee_categories <- read_csv("NTEE categories.csv", col_types = "ccc")
ntee_codes <- read_csv("NTEE codes.csv", col_types = "ccc")
subsection_codes <- read_csv("Subsection codes.csv", col_types = "ccc")
# https://github.com/Nonprofit-Open-Data-Collective/irs-exempt-org-business-master-file#activity-codes
activity_to_ntee <- readxl::read_excel("Activity to NTEE.xlsx", sheet = 2)
ntee_categories %<>%
rename(
NTEE_cat = `NTEE Code`)
ntee_codes %<>%
rename(
NTEE_code = `NTEE Code`)
activity_to_ntee %<>%
transmute(
ACTIVITY = str_pad(ACTIVITY, 3, "left", "0"),
NTEE_replacement = if_else(!is.na(NTEE_replacement), NTEE_replacement, NTEE))
bmf %<>%
mutate(
EIN,
name = NAME,
street = STREET,
city = CITY,
state = STATE,
zip = str_sub(ZIP, 1, 5),
tax_year = str_sub(TAX_PERIOD, 1, 4) %>% as.numeric(),
creation = RULING,
filing_type = if_else(PF_FILING_REQ_CD == 1, "990PF",
if_else(FILING_REQ_CD %in% c(1, 3, 4), "990",
if_else(FILING_REQ_CD %in% c(2), "990N",
"other"))),
revenue = REVENUE_AMT,
assets = ASSET_AMT,
activity_codes = ACTIVITY,
NTEE_code = NTEE_CD,
NTEE_cat = str_sub(NTEE_CD, 1, 1))
# MSA_zip_crosswalk <- glptools::MSA_zip %>%
# filter(MSA == "31140") %>%
# mutate(business_in_county = round(business_in_county, 1))
MSA_zip_crosswalk <- glptools::MSA_zip %>%
filter(MSA == "31140")
bmf_lou <- bmf %>%
left_join(MSA_zip_crosswalk, by = "zip") %>%
filter(!is.na(MSA)) %>%
select(EIN, street, city, state, zip)
# bmf_sure <- bmf %>%
# filter(business_in_county == 100) %>%
# mutate(lou_msa = "sure")
#
# bmf_check <- bmf %>%
# filter(business_in_county > 0, business_in_county < 100) %>%
# mutate(lou_msa = "unsure")
#
# output <- bind_rows(bmf_sure, bmf_check)
#write_csv(output, "Louisville Nonprofits 6_9_2022.csv")
# Remove PO Boxes as they will not be addressed
bmf_lou_po <- bmf_lou %>%
filter(str_detect(street, "PO BOX|PO B0X"))
bmf_lou_remaining <- bmf_lou %>%
anti_join(bmf_lou_po, by = c("EIN", "street", "city", "state", "zip"))
bmf_lou_po %<>%
left_join(MSA_zip_crosswalk, by = "zip") %>%
filter(business_in_county >= 50) %>%
select(EIN, street, city, state, zip)
# Geocode using the Census Bureau geocoder
bmf_lou_census <- bmf_lou_remaining %>%
geocode(
street = street,
city = city,
state = state,
postalcode = zip,
method = "census")
bmf_lou_remaining <- bmf_lou_census %>%
filter(is.na(lat)) %>%
select(-lat, -long)
bmf_lou_census %<>%
filter(!is.na(lat))
# Geocode using the geocodio geocoder
Sys.setenv(GEOCODIO_API_KEY = "cccff3c3cc3aca633fc09ccc3901c1a861a9069")
bmf_lou_geocodio <- bmf_lou_remaining %>%
geocode(
street = street,
city = city,
state = state,
postalcode = zip,
method = "geocodio")
# Bind geocoded information and save
bmf_lou_geocoded <- bind_rows(bmf_lou_census, bmf_lou_geocodio, bmf_lou_po)
bmf_lou_geocoded %<>%
select(EIN, lat, long)
save(bmf_lou_geocoded, file = "bmf_lou_geocoded.RData")
load("bmf_lou_geocoded.RData")
# For PO boxes, assign EINS to the county that contains the majority of the zip code
bmf_lou_POs <- bmf_lou_geocoded %>%
filter(is.na(lat)) %>%
left_join(bmf, by = "EIN") %>%
select(EIN, zip) %>%
left_join(FIPS_zip_full_MSA, by = "zip") %>%
group_by(EIN) %>%
arrange(desc(business_in_county)) %>%
filter(row_number() == 1) %>%
ungroup() %>%
select(EIN, FIPS)
# For other addresses, create sf object and join with an MSA and neighborhood map
bmf_lou_addressed <- bmf_lou_geocoded %>%
filter(!is.na(lat)) %>%
st_as_sf(coords = c("long", "lat"), crs = 4326)
bmf_lou_county <- bmf_lou_addressed %>%
st_join(map_msa_lou, st_within)
bmf_lou_nh <- bmf_lou_addressed %>%
st_join(map_nh, st_within) %>%
st_drop_geometry()
bmf_lou_district <- bmf_lou_addressed %>%
st_join(map_district, st_within) %>%
st_drop_geometry()
pal <- colorFactor(palette = c("#00a9b7FF", "#333333"),
domain = c("Keep", "Drop"))
labels <- sprintf("%s",
left_join(bmf_lou_county, bmf, by = "EIN")$name) %>%
lapply(htmltools::HTML)
leaflet() %>%
addPolygons(data = map_msa_lou,
color = "black",
opacity = 1,
weight = 3,
fillOpacity = 0) %>%
addCircleMarkers(data = mutate(bmf_lou_county, keep = if_else(is.na(FIPS), "Drop", "Keep")),
radius = 2,
color = ~pal(keep),
label = labels,
labelOptions = labelOptions(
style = list("font-weight" = "normal", "font-family" = "Arial", padding = "3px 8px"),
textsize = "15px",
direction = "auto")) %>%
addLegend(pal = pal, values = c("Drop", "Keep")) %>%
addTiles() %>%
setView(-85.6, 38.2, zoom = 8)
rm(pal)
# Filter to EINs in Lou MSA and remove county polygons from the data frame
bmf_lou_county %<>%
st_drop_geometry() %>%
filter(!is.na(FIPS)) %>%
select(-county)
# Add neighborhood names to the data frame
bmf_lou_county %<>%
left_join(bmf_lou_nh, by = "EIN") %>%
left_join(bmf_lou_district, by = "EIN")
# Add PO box data
bmf_lou_county %<>%
bind_rows(bmf_lou_POs)
bmf_lou <- bmf_lou_county
bmf_lou_points <- bmf_lou_addressed %>%
filter(EIN %in% bmf_lou$EIN)
rm(bmf_lou_geocoded, bmf_lou_county, bmf_lou_nh, bmf_lou_district, bmf_lou_POs, bmf_lou_addressed)
Jefferson County has the largest number of nonprofits, though Harrison County happens to have the most per capita! You can mouse over the map below to see the numbers for each county. Click on the “Table” tab to see the data in a table format and to download the information.
county_pop_df <- glpdata::population_msa_counties %>%
filter(year == 2019,
sex == "total",
race == "total")
county_df <- bmf_lou %>%
group_by(FIPS) %>%
count() %>%
ungroup() %>%
left_join(county_pop_df, by = "FIPS") %>%
left_join(MSA_FIPS_info, by = "FIPS") %>%
transmute(
County = county %p% if_else(str_sub(FIPS, 1, 2) == "18",
", IN", ", KY"),
Nonprofits = n,
`Nonprofits per 1000 Residents` = round(n / population * 1000, 1))
county_df %<>%
mutate(
county = str_extract(County, "^.*(?= County)")) %>%
left_join(map_msa_lou, by = "county")
pal <- leaflet::colorNumeric(
palette = RColorBrewer::brewer.pal(9, "BuPu"),
domain = range(county_df$`Nonprofits per 1000 Residents`))
labels <- sprintf("%s<br/>%s<br/>%s",
county_df$County,
"Nonprofit HQs per 1,000 residents: " %p% county_df$`Nonprofits per 1000 Residents`,
"Total Nonprofits: " %p% county_df$`Nonprofits`) %>%
lapply(htmltools::HTML)
leaflet(st_as_sf(county_df)) %>%
addPolygons(
color = "#444444", weight = 1,
smoothFactor = 0.5, opacity = 1.0, fillOpacity = 0.5,
fillColor = ~pal(`Nonprofits per 1000 Residents`),
label = labels,
labelOptions = labelOptions(
style = list("font-weight" = "normal", "font-family" = "Arial", padding = "3px 8px"),
textsize = "15px",
direction = "auto")) %>%
addTiles
county_df %<>%
select(County, Nonprofits, `Nonprofits per 1000 Residents`)
county_df %>%
downloadthis::download_this(
output_name = "Nonprofits by County",
output_extension = ".csv",
button_label = "Download data",
button_type = "warning",
has_icon = TRUE,
icon = "fa fa-save",
csv2 = FALSE)
apply_table_settings <- function(table) {
table %>%
opt_row_striping(row_striping = TRUE) %>%
opt_table_outline() %>%
tab_options(
table.font.size = px(12),
table.width = pct(50)) %>%
tab_style(
cell_text(
weight = "bold"),
cells_row_groups())
}
county_df %>%
gt() %>%
fmt_integer(columns = c(Nonprofits)) %>%
tab_header(title = "Nonprofits by county",
subtitle = "") %>%
apply_table_settings()
| Nonprofits by county | ||
|---|---|---|
| County | Nonprofits | Nonprofits per 1000 Residents |
| Clark County, IN | 485 | 4.1 |
| Floyd County, IN | 373 | 4.8 |
| Harrison County, IN | 239 | 6.0 |
| Washington County, IN | 139 | 5.0 |
| Bullitt County, KY | 243 | 3.0 |
| Henry County, KY | 63 | 4.0 |
| Jefferson County, KY | 4,554 | 5.9 |
| Oldham County, KY | 310 | 4.7 |
| Shelby County, KY | 216 | 4.6 |
| Spencer County, KY | 77 | 4.2 |
Unsurprisingly (to me, at least) Downtown has the largest concentration of nonprofit headquarters. Since Districts have approximately the same number of residents and I didn’t have the data on hand, I did not include per capita figures.
council_df <- bmf_lou %>%
group_by(district) %>%
count() %>%
ungroup() %>%
filter(!is.na(district)) %>%
transmute(
`Council District` = district,
Nonprofits = n)
council_df %<>%
left_join(map_district, by = c("Council District" = "district"))
pal <- leaflet::colorNumeric(
palette = RColorBrewer::brewer.pal(9, "BuPu"),
domain = range(council_df$Nonprofits))
labels <- sprintf("%s<br/>%s",
"District " %p% council_df$`Council District`,
"Nonprofit HQs: " %p% council_df$Nonprofits) %>%
lapply(htmltools::HTML)
leaflet(st_as_sf(council_df)) %>%
addPolygons(
color = "#444444", weight = 1,
smoothFactor = 0.5, opacity = 1.0, fillOpacity = 0.5,
fillColor = ~pal(Nonprofits),
label = labels,
labelOptions = labelOptions(
style = list("font-weight" = "normal", "font-family" = "Arial", padding = "3px 8px"),
textsize = "15px",
direction = "auto")) %>%
addTiles
council_df %<>%
select(`Council District`, Nonprofits)
council_df %>%
downloadthis::download_this(
output_name = "Nonprofits by Metro Council District",
output_extension = ".csv",
button_label = "Download data",
button_type = "warning",
has_icon = TRUE,
icon = "fa fa-save",
csv2 = FALSE)
council_df %>%
gt() %>%
fmt_integer(columns = c(Nonprofits)) %>%
tab_header(title = "Nonprofits by Council District",
subtitle = "") %>%
apply_table_settings()
| Nonprofits by Council District | |
|---|---|
| Council District | Nonprofits |
| 1 | 104 |
| 2 | 76 |
| 3 | 95 |
| 4 | 623 |
| 5 | 133 |
| 6 | 207 |
| 7 | 261 |
| 8 | 194 |
| 9 | 209 |
| 10 | 161 |
| 11 | 127 |
| 12 | 77 |
| 13 | 69 |
| 14 | 53 |
| 15 | 88 |
| 16 | 174 |
| 17 | 116 |
| 18 | 212 |
| 19 | 243 |
| 20 | 137 |
| 21 | 96 |
| 22 | 90 |
| 23 | 93 |
| 24 | 59 |
| 25 | 70 |
| 26 | 121 |
nh_pop_df <- glpdata::population_nh %>%
filter(year == 2019,
sex == "total",
race == "total") %>%
select(neighborhood, population)
nh_df <- bmf_lou %>%
group_by(neighborhood) %>%
count() %>%
ungroup() %>%
left_join(nh_pop_df, by = "neighborhood") %>%
transmute(
Neighborhood = neighborhood,
Nonprofits = n,
`Nonprofits per 1000 Residents` = round(n / population * 1000, 1))
make_map(rename(nh_df, neighborhood = Neighborhood),
"Nonprofits",
hover_name = "Nonprofit HQ",
units = "none")
nh_df %>%
downloadthis::download_this(
output_name = "Nonprofits by Neighborhood",
output_extension = ".csv",
button_label = "Download data",
button_type = "warning",
has_icon = TRUE,
icon = "fa fa-save",
csv2 = FALSE)
nh_df %>%
gt() %>%
fmt_integer(columns = c(Nonprofits)) %>%
tab_header(title = "Nonprofits by Council District",
subtitle = "") %>%
apply_table_settings()
| Nonprofits by Council District | ||
|---|---|---|
| Neighborhood | Nonprofits | Nonprofits per 1000 Residents |
| Airport | 7 | NA |
| Algonquin-Park Hill-Park Duvalle | 44 | 2.8 |
| Buechel-Newburg-Indian Trail | 115 | 3.3 |
| Butchertown-Clifton-Crescent Hill | 147 | 6.5 |
| California-Parkland | 69 | 7.8 |
| Chickasaw-Shawnee | 86 | 4.8 |
| Downtown-Old Louisville-University | 505 | 28.9 |
| Fairdale | 17 | 1.2 |
| Fern Creek | 67 | 2.4 |
| Floyd's Fork | 228 | 4.6 |
| Germantown | 77 | 5.9 |
| Highlands | 151 | 7.1 |
| Highview-Okolona | 170 | 2.6 |
| J-Town | 318 | 5.8 |
| Northeast Jefferson | 736 | 6.2 |
| Phoenix Hill-Smoketown-Shelby Park | 163 | 15.9 |
| Pleasure Ridge Park | 88 | 2.1 |
| Portland | 59 | 5.8 |
| Russell | 52 | 5.0 |
| Shively | 116 | 3.7 |
| South Central Louisville | 79 | 3.2 |
| South Louisville | 154 | 3.0 |
| Southeast Louisville | 238 | 4.2 |
| St. Matthews | 142 | 6.8 |
| Valley Station | 60 | 2.1 |
| NA | 2,811 | NA |
(The table here is pretty ungainly to look at in my opinion, but you can download it.)
zip_pop_df <- glpdata::population_zip %>%
filter(year == 2019,
sex == "total",
race == "total") %>%
select(zip, population_total) %>%
distinct()
zip_df <- bmf_lou %>%
left_join(bmf, by = "EIN") %>%
group_by(zip) %>%
count() %>%
ungroup() %>%
left_join(zip_pop_df, by = "zip") %>%
transmute(
`Zip Code` = zip,
Nonprofits = n,
`Nonprofits per 1000 Residents` =
round(n / population_total * 1000, 1))
zip_df %>%
downloadthis::download_this(
output_name = "Nonprofits by Zip Code",
output_extension = ".csv",
button_label = "Download data",
button_type = "warning",
has_icon = TRUE,
icon = "fa fa-save",
csv2 = FALSE)
# zip_df %>%
# gt() %>%
# fmt_integer(columns = c(Nonprofits)) %>%
# tab_header(title = "Nonprofits by Zip Code",
# subtitle = "") %>%
# apply_table_settings()
make_map(rename(zip_df, zip = `Zip Code`),
"Nonprofits",
hover_name = "Nonprofit HQ",
units = "none")
Deciding which organizations fit where is complicated, to say the least. We have a couple ways of approaching it:
bmf_lou %<>%
left_join(bmf, by = "EIN")
# Change AMVETS from subsection 91 to 19.
bmf_lou$SUBSECTION[bmf_lou$name == "AMVETS"] <- "19"
# Change Thornton to employee benefit trust
bmf_lou$SUBSECTION[bmf_lou$EIN == "611040959"] <- "09"
# Classify 501(c)(3) organizations as foundations, Churches/Religious orgs, Government entities, or charitable nonprofits
bmf_lou %<>%
left_join(subsection_codes, by = "SUBSECTION") %>%
mutate(
org_type = case_when(
SUBSECTION != "03" ~ `short name`,
PF_FILING_REQ_CD == 1 ~ "Foundation",
FILING_REQ_CD %in% c(6, 13) ~ "Church or Religious Organization",
FILING_REQ_CD == 14 ~ "Government Entities",
# other FILING_REQ_CD: 1, 2, 3
FILING_REQ_CD %in% 1:3 ~ "Charitable Nonprofit",
FOUNDATION %in% c(10) ~ "Church or Religious Organization",
FOUNDATION %in% c(4, 11, 16, 17) ~ "Foundation",
FOUNDATION %in% c(15) ~ "Government Entities"))
# Replace missing NTEE codes with an activity code to NTEE crosswalk
bmf_lou %<>%
mutate(activity_code_1 = str_sub(ACTIVITY, 1, 3)) %>%
left_join(activity_to_ntee, by = c("activity_code_1" = "ACTIVITY")) %>%
mutate(
NTEE_code = if_else(is.na(NTEE_code), NTEE_replacement, NTEE_code),
NTEE_cat = str_sub(NTEE_code, 1, 1))
# Add NTEE1 and NTEE2
bmf_lou %<>%
left_join(ntee_categories, by = "NTEE_cat") %>%
mutate(NTEE1 = Description) %>%
select(-Description, -Definition) %>%
left_join(ntee_codes, by = "NTEE_code") %>%
mutate(NTEE2 = Description)%>%
select(-Description, -Definition) %>%
mutate(NTEE2 = if_else(!is.na(NTEE1) & is.na(NTEE2), "uncategorized", NTEE2))
library(sunburstR)
library(d3r)
df <- bmf_lou_classified %>%
select(org_type, NTEE1, NTEE2, name, revenue) %>%
mutate(
org_type = str_replace(org_type, "-", " "),
NTEE1 = str_replace(NTEE1, "-", " "),
NTEE2 = str_replace(NTEE2, "-", " "),
name = str_replace(name, "-", " "),
revenue = if_else(revenue < 0, 0, revenue)) %>%
mutate(org_type = replace_na(org_type, "uncategorized"),
NTEE1 = replace_na(NTEE1, "uncategorized")) %>%
count(org_type, NTEE1, NTEE2, name) %>%
rename(size = n)
tree <- d3_nest(df, value_cols = "size")
sunburst(tree, width="100%", height=600, legend = FALSE)
sund2b(tree, width="100%")
org_type1 <- bmf_lou_classified %>%
group_by(org_type) %>%
summarize(n = n(), .groups = "drop")
org_type2 <- bmf_lou_classified %>%
filter(org_type == "Charitable Nonprofit") %>%
group_by(NTEE1) %>%
summarize(n = n(), .groups = "drop")
org_type3 <- bmf_lou_classified %>%
filter(org_type == "Charitable Nonprofit") %>%
group_by(NTEE2) %>%
summarize(n = n(), .groups = "drop") %>%
mutate(NTEE_cat = str_sub(NTEE2, 1, 1)) %>%
left_join(ntee_categories, by = "NTEE_cat")
org_type1 <- bmf_lou_classified %>%
group_by(org_type) %>%
summarize(n = sum(revenue, na.rm = TRUE), .groups = "drop")
org_type2 <- bmf_lou_classified %>%
filter(org_type == "Charitable Nonprofit") %>%
group_by(NTEE1) %>%
summarize(n = sum(revenue, na.rm = TRUE), .groups = "drop")
org_type3 <- bmf_lou_classified %>%
filter(org_type == "Charitable Nonprofit") %>%
group_by(NTEE2) %>%
summarize(n = sum(revenue, na.rm = TRUE), .groups = "drop")
Our first way of separating out types of nonprofits is to look at thir subsection codes. We are primarily concerned with 501(c)(3)s, or charitable nonprofits, and all information from here on out will focus on those.
subsection_df <- bmf_lou %>%
group_by(SUBSECTION, `short name`) %>%
count() %>%
ungroup() %>%
transmute(
Subsection = SUBSECTION,
`Subsection name` = `short name`,
`Nonprofits` = n)
subsection_df %>%
downloadthis::download_this(
output_name = "Nonprofits by IRS Subsection",
output_extension = ".csv",
button_label = "Download data",
button_type = "warning",
has_icon = TRUE,
icon = "fa fa-save",
csv2 = FALSE)
subsection_df %>%
gt() %>%
fmt_integer(columns = c(`Nonprofits`)) %>%
tab_header(title = "Nonprofits by county",
subtitle = "") %>%
apply_table_settings()
| Nonprofits by county | ||
|---|---|---|
| Subsection | Subsection name | Nonprofits |
| 01 | Credit Unions | 4 |
| 02 | Title-holding Organization | 10 |
| 03 | Charitable Nonprofits | 5,406 |
| 04 | Civic Associations | 259 |
| 05 | Labor and Agricultural Organizations | 114 |
| 06 | Business Associations | 235 |
| 07 | Social Clubs and Greek Organizations | 248 |
| 08 | Fraternal Societies | 184 |
| 09 | Employee Benefit Trusts | 15 |
| 10 | Fraternal Societies | 53 |
| 12 | Utilities | 16 |
| 13 | Cemeteries | 54 |
| 14 | Credit Unions | 7 |
| 15 | Mutual Insurance | 1 |
| 19 | Veteran Support Organizations | 82 |
| 92 | Trusts | 11 |
501(c)(3)s are further grouped by foundation codes. The biggest distinction this creates is between private foundations and publicly-supported charities. At this stage, we classify organizations into the following groups: * Foundations, * Churches * Schools * Hospital and Medical Research Organizations, * Government Entities * Public Charities
bmf_lou %<>%
mutate(
org_type = case_when(
SUBSECTION != "03" ~ `short name`,
FOUNDATION %in% c(2:4, 21) ~ "Foundation",
FOUNDATION == 10 ~ "Church or Religious Organization",
FOUNDATION %in% c(11, 13, 23) ~ "School",
FOUNDATION %in% c(12, 22, 24) ~ "Hospital or Medical Research Org",
FOUNDATION == 14 ~ "Government Entity",
FOUNDATION %in% 15:17 ~ "Public charity"))
# other FILING_REQ_CD: 1, 2, 3
# FILING_REQ_CD == 13 ~ "Church or Religious Organization",
# FILING_REQ_CD == 14 ~ "Government Entities"))
bmf_lou_foundation <- bmf_lou %>%
filter(SUBSECTION == "03") %>%
group_by(org_type) %>%
summarize(
number = n(),
revenue = sum(revenue, na.rm = T),
missing_revenue = mean(is.na(REVENUE_AMT)),
) %>%
ungroup() %>%
transmute(
`Organization Type` = org_type,
Number = number,
`Total Revenue` = revenue,
`Percent missing revenue` = missing_revenue)
bmf_lou_foundation %>%
downloadthis::download_this(
output_name = "501(c)(3)s by foundation code",
output_extension = ".csv",
button_label = "Download data",
button_type = "warning",
has_icon = TRUE,
icon = "fa fa-save",
csv2 = FALSE)
bmf_lou_foundation %>%
gt() %>%
apply_table_settings() %>%
fmt_currency(columns = vars(`Total Revenue`),
use_subunits = F) %>%
fmt_percent(columns = vars(`Percent missing revenue`),
decimals = 0) %>%
fmt_integer(columns = c(`Number`)) %>%
tab_header(title = "501(c)(3)s by Foundation Code",
subtitle = "")
| 501(c)(3)s by Foundation Code | |||
|---|---|---|---|
| Organization Type | Number | Total Revenue | Percent missing revenue |
| Church or Religious Organization | 943 | $22,941,639 | 95% |
| Foundation | 458 | $15,774,977 | 97% |
| Government Entity | 2 | $0 | 50% |
| Hospital or Medical Research Org | 51 | $6,848,285,186 | 47% |
| Public charity | 3,872 | $4,096,102,738 | 12% |
| School | 80 | $456,599,867 | 15% |
Having drilled down to something close to public charities, the next logical question is what is going on in the public charities / social services sector?
This table shows our public charities broken out by NTEE code. You can click on the arrow next to each title top open a sub-table with more detailed information.
Many organizations founded before the mid-90s are classified into a NTEE Category, but we are unsure which detailed code they belong to, so they end up in the “uncaegorized” category.
bmf_lou_table <- bmf_lou %>%
filter(SUBSECTION == "03", org_type == "Public charity") %>%
group_by(NTEE1) %>%
summarize(
number = n(),
revenue = sum(revenue, na.rm = T),
missing_revenue = mean(is.na(REVENUE_AMT)),
) %>%
ungroup() %>%
transmute(
`NTEE Category` = NTEE1,
Number = number,
Revenue = revenue,
`Pct missing revenue` = missing_revenue)
library(reactable)
bmf_lou_table %>%
reactable(
columns = list(
`NTEE Category` = colDef(format = colFormat()),
Number = colDef(format = colFormat(separators = TRUE)),
Revenue = colDef(format = colFormat(prefix = "$", separators = TRUE, digits = 0)),
`Pct missing revenue` = colDef(format = colFormat(percent = TRUE, digits = 1))),
details = function(index) {
this_data <- bmf_lou %>%
filter(SUBSECTION == "03", org_type == "Public charity", NTEE1 == bmf_lou_table$`NTEE Category`[index]) %>%
group_by(NTEE2) %>%
summarize(
number = n(),
revenue = sum(revenue, na.rm = T),
missing_revenue = mean(is.na(REVENUE_AMT))) %>%
transmute(
`NTEE Category` = NTEE2,
Number = number,
Revenue = revenue,
`Pct missing revenue` = missing_revenue)
htmltools::div(style = "padding: 1rem",
reactable(this_data, outlined = TRUE,
columns = list(
`NTEE Category` = colDef(format = colFormat()),
Number = colDef(format = colFormat(separators = TRUE)),
Revenue = colDef(format = colFormat(prefix = "$", separators = TRUE, digits = 0)),
`Pct missing revenue` = colDef(format = colFormat(percent = TRUE, digits = 1)))))
},
defaultPageSize = 30)
This interactive lets you drill down through the 26 NTEE categories and 655 NTEE codes to see organizations.
This graph includes all 3,693 public charities in Louisville. The inner-most purple ring represents public charities, the next ring shows the 26 NTEE categories, the third ring shows all of the 655 NTEE codes present in Louisville, and the final ring shows individual organizations.
A couple of organizations are represented several times in the data under different EIN numbers–something for me to look into.
The “breadcrum” trail in the upper right shows you what percentage of local nonprofits are in the category you mouse over. As you click on the rings, it will zoom to that particular ring. (To Zoom out, click on a ring in the middle that you want to Zoom back out to.)
library(sunburstR)
library(d3r)
df <- bmf_lou %>%
filter(SUBSECTION == "03", org_type == "Public charity") %>%
select(NTEE1, NTEE2, name, revenue) %>%
mutate(
NTEE1 = str_replace(NTEE1, "-", " "),
NTEE2 = str_replace(NTEE2, "-", " "),
name = str_replace(name, "-", " "),
revenue = if_else(revenue < 0, 0, revenue)) %>%
mutate(NTEE1 = replace_na(NTEE1, "uncategorized")) %>%
group_by(NTEE1, NTEE2) %>%
count(NTEE1, NTEE2, name) %>%
rename(size = n) %>%
ungroup()
tree <- d3_nest(df, value_cols = "size")
sund2b(tree, width="100%", rootLabel = "Public Charities",
colors = htmlwidgets::JS("d3.scaleOrdinal(d3.schemeCategory20b)"))
This graph is similar to the last, though each organization’s size is relative to its’ last reported revenue. As a result, some individual organizations like the UL Research Foundation, Norton Healthcare, or UL Physicians Group make up an outsize proportion of the graph.
The “breadcrum” trail now shows the sum of reported revenue for these organizations: over $4 billion as of their last filings.
library(sunburstR)
library(d3r)
df <- bmf_lou %>%
filter(SUBSECTION == "03", org_type == "Public charity") %>%
select(NTEE1, NTEE2, name, revenue) %>%
mutate(
NTEE1 = str_replace(NTEE1, "-", " "),
NTEE2 = str_replace(NTEE2, "-", " "),
name = str_replace(name, "-", " "),
revenue = if_else(revenue < 0, 0, revenue)) %>%
mutate(NTEE1 = replace_na(NTEE1, "uncategorized")) %>%
group_by(NTEE1, NTEE2, name) %>%
summarize(revenue = sum(revenue, na.rm = T), .groups = "drop")
tree <- d3_nest(df, value_cols = "revenue")
sund2b(tree, width="100%", rootLabel = "Public Charities", valueField = "revenue",
colors = htmlwidgets::JS("d3.scaleOrdinal(d3.schemeCategory20b)"))